@dhtclk (Collaborator) commented Jul 28, 2025

Summary

Adds a troubleshooting page that collects useful links based on support data. This should boost the discoverability of some existing quality docs and KB articles.

@dhtclk requested a review from a team as a code owner July 28, 2025 20:56
@dhtclk linked an issue Jul 28, 2025 that may be closed by this pull request

@Blargian (Member)

@dhtclk LGTM, just two small things to change:

Screenshot 2025-07-29 at 11 31 31

Could we make it just a single menu item rather than a dropdown? I think the expanding tab is only necessary if we have more than one page.

If we're adding here we should probably add it as an item in the top menu:

Screenshot 2025-07-29 at 11 32 37

@Blargian (Member) left a comment

Let's get this in soon, just a few small changes to make - see comments.

- [Session memory settings](/docs/operations/settings/settings)
<br/>
### Scaling and sizing: {#scaling-and-sizing}
- [Right-size your service](/docs/operations/tips)
Member

Just an idea, but some of these, like this one, are only really applicable to OSS ClickHouse. Maybe we should mark them with a ⛁ vs ☁️ if they are specific to one or the other.

@dhtclk (Collaborator, Author) commented Aug 6, 2025

@kellytoole I know you also said you wanted to take a look at this. There's a first draft of the "lessons learned" doc in this PR along with the troubleshooting section.

- SQL flexibility enabled complex rate limiting rules beyond simple counters
- Leveraged existing data pipeline instead of requiring separate infrastructure

## ClickHouse for customer analytics {#customer-analytics}
Contributor

Is it really a creative use case? It looks like standard RTA to me

Collaborator Author

Ah, maybe not. This may be an example of my limited experience with ClickHouse thus far. We could either reframe/re-title the document to something like Customer Success Stories vs Creative Use-cases, or I can find a different example.

- Customers could create their own segments and slice data freely
- No more engineering bottlenecks for new analytical requirements

```sql runnable editable
```
Contributor

Maybe a wider issue, but I noticed that scrolling stops working when hovering over the code editor.


In Microsoft's web analytics system, search results trigger different types of answers - weather cards, sports information, news articles, and factual responses. Each query result was tagged with descriptive strings like "weather_answer," "sports_answer," or "factual_answer." With billions of search queries processed, these string values were being stored repeatedly in ClickHouse, consuming massive amounts of storage space and requiring expensive string comparisons during queries.

Microsoft implemented a string-to-integer mapping system using a separate MySQL database. Instead of storing the actual strings in ClickHouse, they store only integer IDs. When users run queries through the UI and request data for `weather_answer`, their query optimizer first consults the MySQL mapping table to get the corresponding integer ID, then converts the query to use that integer before sending it to ClickHouse.
Contributor

I wonder if the mapping solution could be implemented using a Dictionary here instead of MySQL. I understand we want to share the story as-is from the customer, but maybe we could suggest a "better" solution if one exists in ClickHouse.
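
A rough sketch of what the Dictionary idea could look like. This is not the customer's actual setup: the table names `answer_type_mapping` and `search_events`, the column `answer_type_id`, and the dictionary name `answer_type_ids` are all hypothetical, and the source could presumably point at the existing MySQL table instead.

```sql
-- Hypothetical mapping table: one row per answer-type string.
CREATE TABLE answer_type_mapping
(
    answer_type String,
    id UInt32
)
ENGINE = MergeTree
ORDER BY answer_type;

-- Dictionary keyed by the string, so the lookup stays inside ClickHouse.
CREATE DICTIONARY answer_type_ids
(
    answer_type String,
    id UInt32
)
PRIMARY KEY answer_type
SOURCE(CLICKHOUSE(TABLE 'answer_type_mapping'))
LAYOUT(COMPLEX_KEY_HASHED())
LIFETIME(MIN 300 MAX 600);

-- Resolve the string to its integer ID at query time.
-- search_events is a hypothetical fact table that stores only integer IDs.
SELECT count()
FROM search_events
WHERE answer_type_id = dictGet('answer_type_ids', 'id', tuple('weather_answer'));
```

With this shape, the string-to-ID translation happens in the query itself rather than in an external optimizer step.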


## Partition-based data management {#partition-management}

Microsoft Clarity discovered that partitioning strategy impacts both performance and operational simplicity. Their approach: partition by date, order by hour. This strategy delivers multiple benefits beyond just cleanup efficiency—it enables trivial data cleanup, simplifies billing calculations for their customer-facing service, and supports GDPR compliance requirements for row-based deletion.
Contributor

Link to resources on how to manage partitions in ClickHouse
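
Alongside a link, a short example might help. The following is only a sketch of the "partition by date, order by hour" pattern from the excerpt: the table `page_events` and its columns are hypothetical, using `toStartOfHour` in the sorting key is one interpretation of "order by hour", and none of this is Microsoft Clarity's actual schema.

```sql
-- Hypothetical events table: one partition per day, sorted by hour.
CREATE TABLE page_events
(
    event_time DateTime,
    project_id UInt64,
    payload String
)
ENGINE = MergeTree
PARTITION BY toDate(event_time)
ORDER BY (toStartOfHour(event_time), project_id);

-- Inspect partitions and their sizes (useful for cleanup and billing-style rollups).
SELECT
    partition,
    sum(rows) AS total_rows,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE database = currentDatabase() AND table = 'page_events' AND active
GROUP BY partition
ORDER BY partition;

-- Remove a whole day of data in one metadata operation.
ALTER TABLE page_events DROP PARTITION '2025-01-01';
```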


Instead of looking at average performance, identify specific query patterns that cause problems:

```sql runnable editable
```
Contributor

What am I supposed to achieve? A faster query? Lower resource usage?


**Universal pain point:** Small frequent inserts create performance degradation through part explosion.

## Recognize the Problem Early {#recognize-parts-problem}
Contributor

Provide context on how to use this query
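
For context, a part-count check along these lines is one way to spot the problem early. The query under review is not shown in this thread, so this is only an illustrative sketch against `system.parts`; the threshold of 100 active parts per partition is an arbitrary example value, not an official limit.

```sql
-- Partitions with many active parts often indicate small, frequent inserts.
SELECT
    database,
    table,
    partition,
    count() AS active_parts
FROM system.parts
WHERE active
GROUP BY database, table, partition
HAVING active_parts > 100  -- arbitrary example threshold
ORDER BY active_parts DESC
LIMIT 20;
```

A table that shows up here repeatedly is a candidate for larger insert batches.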

<details>
<summary><strong>Show performance and error solutions</strong></summary>

### Query performance {#query-performance}
Contributor

color issue on light theme

description: 'Find solutions to the most common ClickHouse problems including slow queries, memory errors, connection issues, and configuration problems.'
---

# Troubleshooting Common Issues {#troubleshooting-common-issues}
Contributor

This is a great resource page. Why is it under the Community wisdom section?

Collaborator Author

It was placed here as a sort of catch-all for tips and tricks; I can definitely look into moving it. I'm hoping that, with search improvements and such, users will find it via search or contextual navigation rather than by navigating the tree.

If you can't find a solution:

1. **Ask AI** - <KapaLink>Ask AI</KapaLink> for instant answers.
1. **Check system tables** - Run `SELECT * FROM system.processes` and `SELECT * FROM system.query_log ORDER BY event_time DESC LIMIT 10`
Contributor

Do we have an existing section on how to check system tables? Just two queries lack a bit of context on what to do with the results of those queries.

Also, running `SELECT * FROM system.processes` without a `LIMIT` on an overloaded system might not be the best idea.
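
If the queries stay on the page, a bounded variant could look roughly like the following. This is a sketch only: the column selection, the one-hour window, and the 120-character preview length are illustrative choices, not something taken from the PR.

```sql
-- Longest-running queries right now, capped at 10 rows.
SELECT
    query_id,
    user,
    elapsed,
    formatReadableSize(memory_usage) AS memory,
    substring(query, 1, 120) AS query_preview
FROM system.processes
ORDER BY elapsed DESC
LIMIT 10;

-- Recent failed queries from the last hour, capped at 10 rows.
SELECT
    event_time,
    query_duration_ms,
    exception,
    substring(query, 1, 120) AS query_preview
FROM system.query_log
WHERE type IN ('ExceptionBeforeStart', 'ExceptionWhileProcessing')
  AND event_time > now() - INTERVAL 1 HOUR
ORDER BY event_time DESC
LIMIT 10;
```

Selecting named columns and truncating the query text keeps the result small even when the server is under load.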

Collaborator Author

linked to system tables overview instead


**Prevention:**

```sql
```
Contributor

I don't think this query prevents locking the database in the case of an expensive mutation. It simply monitors the progress of ongoing mutations.

Collaborator Author

edited to monitoring mutations
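
For reference, a monitoring query of this kind might look like the sketch below. It is not necessarily the query in the PR, which is not shown in this thread; it only assumes the standard `system.mutations` table.

```sql
-- Unfinished mutations, oldest first, with remaining work and the last failure (if any).
SELECT
    database,
    table,
    mutation_id,
    command,
    create_time,
    parts_to_do,
    latest_fail_reason
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time;
```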


[Official recommendation](/best-practices/selecting-an-insert-strategy#batch-inserts-if-synchronous): minimum 1,000 rows per insert, ideally 10,000 to 100,000.

### Data quality issues {#data-quality-issues}
Contributor

Rename to invalid timestamps issues.


### Disk space problems {#disk-space-problems}

Disk space exhaustion in replicated setups creates cascading problems. When one node runs out of space, other nodes continue trying to sync with it, causing network traffic spikes and confusing symptoms. One community member spent 4 hours debugging what was simply low disk space.
Contributor

Do we have a simple query we can share here to detect remaining disk space on each node?
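
One candidate, as a sketch: `system.disks` reports free and total space per disk, and `clusterAllReplicas` can fan the query out to every node. The cluster name `'default'` is an assumption and would need to match the deployment; on a single node, querying `system.disks` directly is enough.

```sql
-- Free space per disk on every replica, least free space first.
SELECT
    hostName() AS host,
    name AS disk,
    path,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total,
    round(100 * free_space / total_space, 1) AS free_pct
FROM clusterAllReplicas('default', system.disks)  -- 'default' is an assumed cluster name
ORDER BY free_pct ASC;
```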


By default, distributed tables use single-threaded inserts. Enable `insert_distributed_sync` for parallel processing and immediate data sending to shards.

Monitor temporary data accumulation when using distributed tables.
Contributor

Can we share a query to monitor the temporary data accumulation?
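
A possible starting point, as a sketch: `system.distribution_queue` lists the files that distributed tables have queued locally but not yet sent to shards. Which columns to show and how to sort are illustrative choices.

```sql
-- Pending data per distributed table, largest backlog first.
SELECT
    database,
    table,
    data_files,
    formatReadableSize(data_compressed_bytes) AS pending_size,
    error_count,
    is_blocked,
    last_exception
FROM system.distribution_queue
WHERE data_files > 0
ORDER BY data_compressed_bytes DESC;
```

A backlog that keeps growing, or a non-zero `error_count`, suggests the shards are not accepting data as fast as it is queued.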

@dhtclk merged commit c0cf844 into main Aug 21, 2025
15 checks passed
Successfully merging this pull request may close these issues.

Add Troubleshooting Section Identify common user pain points from meet up presentations
3 participants